Automatic Model Refinement - with an application to tagging
نویسندگان
چکیده
Statistical NLP models usually only consider coarse information and very restricted context to make the estimation of parameters feasible. To reduce the modeling error introduced by a simplified probabilistic model, the Classitication and Regression Tree (CART) method was adopted in this paper to select more discriminative features for automatic model refinement. Because the features are adopted dependently during splitting the classification tree in CART, the number of training data in each terminal node is small, which makes the labeling process of terminal nodes not robust. This over-tuning phenomenon cannot be completely removed by crossvalidation process (i.e., pruning process). A probabilistic classification model based on the selected discriminative features is thtls proposed to use the training data more efficiently. In tagging the Brown Corpus, our probabilistic classification model reduces the error rate of the top 10 error dominant words from 5.71% to 4.35%, which shows 23.82% improvement over the un-
منابع مشابه
Automatic Refinement of Linguistic Rules for Tagging
This paper describes an approach to POS tagging based on the automatic refinement of manually written linguistic tagging rules. The refinement was carried out by means of a learning algorithm based on decision trees. The tagging rules work on ambiguity classes: each input word undergoes a morphological analysis and a set of possible tags is returned. The set of tags determines the ambiguity cla...
متن کاملAn improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملبرچسبگذاری ادات سخن زبان فارسی با استفاده از مدل شبکۀ فازی
Part of speech tagging (POS tagging) is an ongoing research in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...
متن کاملسیستم برچسب گذاری اجزای واژگانی کلام در زبان فارسی
Abstract: Part-Of-Speech (POS) tagging is essential work for many models and methods in other areas in natural language processing such as machine translation, spell checker, text-to-speech, automatic speech recognition, etc. So far, high accurate POS taggers have been created in many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...
متن کاملModelface: an application programming interface (API) for homology modeling studies using Modeller software
An interactive application, Modelface, was presented for Modeller software based on windows platform. The application is able to run all steps of homology modeling including pdb to fasta generation, running clustal, model building and loop refinement. Other modules of modeler including energy calculation, energy minimization and the ability to make single point mutations in the PDB structures a...
متن کامل